Abstract

We introduce a new model of linear regression for random functional
inputs taking into account the first order derivative of the data.
We propose an estimation method which comes down to solving a
special linear inverse problem. Our procedure tackles the problem
through a double and synchronized penalization. An asymptotic
expansion of the mean square prediction error is given. The model and the
method are applied to a benchmark dataset of spectrometric curves
and compared with other functional models.

Keywords: Functional data, Linear regression model, Differential operator,
Penalization, Spectrometric curves.



1 Introduction 



Functional Data Analysis is a well-known area of modern statistics. Advances
in computer sciences make it now possible to collect data from an underlying
continuous-time process, say $(\xi_t)_{t \geq 0}$, at high frequencies. The
traditional point of view consisting in discretizing $(\xi_t)$ at
$t_1, \dots, t_p$ and studying it by classical multidimensional tools is
outperformed by interpolation methods (such as splines or wavelets). These
techniques provide the statistician with a reconstructed curve on which
inference may be carried out through what we may call "functional models",
i.e. versions of the classical multidimensional models designed and suited
for data that are curves. Thus functional PCA, ANOVA or Canonical Analysis,
and even density estimation for curves or processes, have been investigated.
We refer to Ramsay and Silverman (1997, 2002), Bosq (2000) and Ferraty and
Vieu (2006) for monographs on functional data analysis. Recently many authors
focused on various versions of the regression
model introduced by Ramsay and Dalzell (1991):

\[ y_i = \int_0^T X_i(t)\,\rho(t)\,dt + \varepsilon_i, \tag{1} \]

where we assume that the sample $((y_1, X_1), \dots, (y_n, X_n))$ is made of
independent copies of $(y, X)$. Each $X_i = (X_i(t))_{t \in [0,T]}$ is a curve
defined on the set $[0,T]$, $T > 0$, $y_i$ is a real number, $\varepsilon_i$
is a white noise and $\rho$ is an unknown function to be estimated. In other
words the $X_i$'s are random elements defined on an abstract probability space
and taking values in a function space, say $\mathcal{F}$. The vector space
$\mathcal{F}$ endowed with norm $\|\cdot\|_{\mathcal{F}}$ will be described
soon. We refer for instance to Cardot, Mas and Sarda (2006) or Cai and Hall
(2006) for recent results.

In this article we study a new (linear) regression model, defined below,
derived from and echoing the recent paper of Mas and Pumo (2006). The
key idea relies on the fact that most statisticians dealing with functional
data do not fully exploit their functional properties. For instance in
several models integrals such as

\[ \int_0^T X(t)\,\rho(t)\,dt \]

are computed. The integral above is nothing but a scalar product.
Nevertheless derivatives were not given the same interest. Explicit calculations of
derivatives sometimes appear indirectly in kernel methods (when estimating
the derivatives of the density or the regression function) or through
seminorms or norms on $\mathcal{F}$. But surprisingly $X'$ (or $X^{(m)}$)
never appear in the models themselves, whereas people dealing with functional
data often say that "derivatives contain much information, sometimes more than
the initial curves themselves". Our starting idea is the following. Since in a
functional data framework the curve-data are explicitly known and not just
discretized, their derivatives may also be explicitly computed. As a
consequence these derivatives may be "injected" in the model, which may
enhance its prediction power. The reader is referred to the forthcoming
display (5) for an immediate illustration and to Mas and Pumo (2006) for a
first article dealing with a functional autoregressive model including
derivatives.

The paper is rather theoretical even if it is illustrated by a real case study.
It is organized as follows. The next section provides the mathematical
material, dealing with Hilbert spaces and linear operators, then the model is
introduced. The following section is devoted to presenting the estimation
method and its stumbling blocks. The main results are given before we focus on
a real case application to the food industry. The last section contains the
derivation of the theorems.

2 About Hilbert spaces and linear operators 

Silverman (1996) provided a theoretical framework for a smoothed PCA. Ramsay
(2000) highlighted the very wide scope of differential equations in
statistical modelling. Our work is in a way based on this mathematically
involved article. We aim at proving that derivatives may be handled quite
easily in statistical models when the space $\mathcal{F}$ is well chosen.

The choice of the space $\mathcal{F}$ is crucial. We have to keep in mind that
if $X \in \mathcal{F}$, $X'$ does not necessarily belong to $\mathcal{F}$ but
to another space $\mathcal{F}'$ that may be tremendously different from
(larger than) $\mathcal{F}$. We decide to take $\mathcal{F} = W^{2,1}$, the
Sobolev space of order $(2,1)$ defined by

\[ W^{2,1} = \{ u \in L^2[0,1],\ u' \in L^2[0,1] \} \]

for at least three reasons:

• If $X \in \mathcal{F}$, $X' \in L^2[0,1]$, which is a well-known space.

• Both spaces are Hilbert spaces, as well as

\[ W^{2,p} = \{ u \in L^2[0,1],\ u^{(p)} \in L^2[0,1] \}. \]

This is of great interest for mathematical reasons: bases are denumerable,
projection operators are easy to handle, covariance operators
admit spectral representations, etc.

• The classical interpolation methods mentioned above (splines and wavelets)
provide estimates belonging to Sobolev spaces. So from a practical
point of view $W^{2,1}$, and in general $W^{m,p}$, $(m,p) \in \mathbb{N}^2$
(see Adams and Fournier (2003) for definitions), is a natural space in which
our curves should be imbedded.

In the sequel $W^{2,1}$ will be denoted $W$ and $W^{2,0} = L^2$ will be
denoted $L$ for the sake of simplicity. We keep in mind that $W$ (resp. $L$)
could be replaced by a space of higher smoothness index: $W^{2,p}$ where
$p > 1$ (resp. $W^{2,p-1}$). The spaces $W$ and $L$ are separable Hilbert
spaces endowed with the scalar products:

\[ \langle u, v \rangle_W = \int_0^1 u(t)\,v(t)\,dt + \int_0^1 u'(t)\,v'(t)\,dt, \]

\[ \langle u, v \rangle_L = \int_0^1 u(t)\,v(t)\,dt, \]

and with associated norms $\|\cdot\|_W$ and $\|\cdot\|_L$. We refer to Ziemer
(1989) or to Adams and Fournier (2003) for monographs dedicated to Sobolev
spaces. Obviously if we set $Du = u'$ then $D$ maps $W$ onto $L$ ($D$ is the
ordinary differential operator). Furthermore Sobolev's imbedding theorem
ensures (see Adams and Fournier (2003), Theorem 4.12 p. 85) that

\[ \|Du\|_L \leq C\,\|u\|_W \]

(where $C$ is some constant which does not depend on $u$), i.e. $D$ is a
bounded operator from $W$ to $L$. This is a crucial point to keep in mind and
the fourth reason why the functional space was chosen to be $W^{2,1}$: the
differential operator $D$ may be viewed as a continuous linear mapping from
$W$ to $L$.
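
The two scalar products and the boundedness of $D$ can be checked numerically on a discretized curve. The following is only a minimal sketch: the grid, the test curve `u`, and the helper names `trapezoid`, `inner_L` and `inner_W` are illustrative assumptions, and derivatives are approximated by finite differences rather than computed exactly.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal approximation of the integral of `values` over `grid`."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def inner_L(u, v, grid):
    """Discretized L scalar product: integral of u(t) v(t) dt."""
    return trapezoid(u * v, grid)

def inner_W(u, v, grid):
    """Discretized W scalar product: <u, v>_L + <u', v'>_L (finite differences)."""
    du = np.gradient(u, grid)
    dv = np.gradient(v, grid)
    return inner_L(u, v, grid) + inner_L(du, dv, grid)

# Hypothetical smooth curve on [0, 1], chosen only for illustration.
grid = np.linspace(0.0, 1.0, 501)
u = np.sin(2 * np.pi * grid) + 0.5 * grid**2

norm_W = np.sqrt(inner_W(u, u, grid))
du = np.gradient(u, grid)
norm_L_Du = np.sqrt(inner_L(du, du, grid))
# With this scalar product, ||Du||_L <= ||u||_W holds with constant C = 1.
```

Since $\|u\|_W^2 = \|u\|_L^2 + \|Du\|_L^2$, the inequality is immediate here; the sketch only makes the objects concrete.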

Throughout the paper, and especially in the proofs, we will need
basic notions of operator theory. We recall a few important facts. A
linear mapping $T$ from a Hilbert space $H$ to another Hilbert space $H'$ is
continuous whenever

\[ \|T\|_\infty = \sup_{x \in H,\ x \neq 0} \frac{\|Tx\|_{H'}}{\|x\|_H} < +\infty. \tag{2} \]

The adjoint of the operator $T$ will be classically denoted $T^*$. Some finite
rank operators are defined by means of the tensor product: if $u$ and $v$
belong to $H$ and $H'$ respectively, $u \otimes_H v$ is the operator defined
on $H$ by, for all $h \in H$:

\[ (u \otimes_H v)(h) = \langle u, h \rangle_H \, v. \]

Compact operators: Amongst linear operators the class of compact
operators is one of the best known. Compact operators generalize matrices
to the infinite-dimensional setting and feature nice properties. The general
definition of compact operators may be found in Dunford and Schwartz (1988) or
Gohberg, Goldberg and Kaashoek (1991) for instance. By $\mathcal{C}_H$ (resp.
$\mathcal{C}_{HH'}$) we denote the space of compact operators on the Hilbert
space $H$ (resp. mapping the Hilbert space $H$ to $H'$). If $T$ is a compact
operator from a Hilbert space $H_1$ to another Hilbert space $H_2$, $T$ admits
the Schmidt decomposition:

\[ T = \sum_{k \in \mathbb{N}} \mu_k \,(u_k \otimes v_k) \tag{3} \]

where $(u_k)$ (resp. $(v_k)$) is a complete orthonormal system in $H_1$ (resp.
in $H_2$), the $\mu_k$ are the characteristic numbers of $T$ (i.e. the square
roots of the eigenvalues of $T^*T$) and

\[ \lim_{k \to +\infty} \mu_k = 0. \]

From (2) we obtain

\[ \|T\|_\infty = \sup_k \mu_k. \]

When $T$ is symmetric $\mu_k$ is the eigenvalue of $T$ (then $u_k = v_k$). In
this situation and from (3) one may define the square root of $T$ whenever $T$
maps $H$ onto $H$ and is positive: $T^{1/2}$ is still a linear operator
defined by:

\[ T^{1/2} = \sum_{k \in \mathbb{N}} \sqrt{\mu_k}\,(u_k \otimes u_k). \tag{4} \]

Note that finite rank operators are always compact.
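
In finite dimension the Schmidt decomposition is the singular value decomposition, which gives a quick way to check these facts. The sketch below is an illustration only: the random matrix `T` and all variable names are assumptions, and the convention $(u \otimes v)(h) = \langle u, h\rangle v$ is translated into the matrix $v u^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(4, 6))            # maps R^6 (playing H_1) to R^4 (playing H_2)

# numpy factorizes T = V @ diag(mu) @ U_t; rows of U_t are the u_k, columns of V the v_k.
V, mu, U_t = np.linalg.svd(T, full_matrices=False)

# Rebuild T as sum_k mu_k (u_k ⊗ v_k), where u ⊗ v is the matrix v u^T.
T_rebuilt = sum(mu[k] * np.outer(V[:, k], U_t[k, :]) for k in range(len(mu)))

# The operator (sup) norm equals the largest characteristic number.
op_norm = np.linalg.norm(T, 2)

# Square root of the symmetric positive matrix G = T^T T through its eigenvectors.
G = T.T @ T
lam, E = np.linalg.eigh(G)
lam = np.clip(lam, 0.0, None)          # guard against tiny negative rounding errors
G_sqrt = (E * np.sqrt(lam)) @ E.T
```

The same recipe (decompose, transform the characteristic numbers, recompose) is what formula (4) expresses for compact positive operators.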

Hilbert-Schmidt operators: We also mention the celebrated space of
Hilbert-Schmidt operators $\mathcal{HS}(H_1, H_2)$, a subspace of
$\mathcal{C}_{H_1 H_2}$. Let $(u_i)_{i \geq 1}$ be a basis of $H_1$; then
$T \in \mathcal{HS}(H_1, H_2)$ whenever

\[ \sum_{i=1}^{+\infty} \|T(u_i)\|_{H_2}^2 < +\infty. \]

The space $\mathcal{HS}$ is itself a separable Hilbert space endowed with the
scalar product

\[ \langle T, S \rangle_{\mathcal{HS}} = \sum_{i=1}^{+\infty} \langle T(u_i), S(u_i) \rangle_{H_2} \]

and $\langle T, S \rangle_{\mathcal{HS}}$ does not depend on the choice of the
basis $(u_i)_{i \geq 1}$. Finally the following bound is valid for all
$T \in \mathcal{HS}$:

\[ \|T\|_\infty \leq \|T\|_{\mathcal{HS}}. \]

Unbounded operators: If $T$ is a one to one (injective) selfadjoint compact
operator mapping a Hilbert space $H$ to $H$, $T$ admits an inverse $T^{-1}$.
The operator $T^{-1}$ is defined on a dense (and distinct) subspace of $H$:

\[ \mathcal{D}(T^{-1}) = \Big\{ x \in H : \sum_{k} \frac{\langle x, u_k \rangle_H^2}{\mu_k^2} < +\infty \Big\}. \]

It is unbounded, which also means that $T^{-1}$ is continuous at no point for
which it is defined and $\|T^{-1}\|_\infty = +\infty$.
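
A small numerical illustration of this unboundedness, under the purely hypothetical assumption that the eigenvalues decay like $k^{-2}$: restricted to the first $p$ eigenvectors the inverse is bounded, but its norm $1/\mu_p$ blows up as $p$ grows.

```python
import numpy as np

def inverse_norm_on_truncation(eigenvalues):
    """Norm of the inverse of diag(eigenvalues) restricted to the retained modes."""
    return 1.0 / np.min(eigenvalues)

norms = []
for p in (5, 50, 500):
    lam = 1.0 / np.arange(1, p + 1) ** 2    # hypothetical eigenvalues k^{-2}
    norms.append(inverse_norm_on_truncation(lam))
# norms grows like p^2: a perturbation of size eps along u_p is magnified to eps / mu_p.
```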
3 The model

We are now in position to introduce this (random input, linear) regression
model:

\[ y_i = \langle \phi, X_i \rangle_W + \langle \psi, X_i' \rangle_L + \varepsilon_i \tag{5} \]

where all random variables are assumed to be centered. The main result
of the paper (see next section) gives an asymptotic expansion for the mean
square prediction error in (5).

The unknown functions $\phi$ and $\psi$ belong to $W$ and $L$ respectively.

Obviously we are going to face two issues:

• Studying the identifiability of $\phi$ and $\psi$ in the model above.

• Providing a consistent estimation procedure for $\phi$ and $\psi$.

From now on we suppose that:

A1: $\|X\|_W < M$ a.s.

This assumption could be relaxed to milder moment assumptions. We
claim that our main result holds whenever

A'1: $E\|X\|_W^4 < M$

is true. But considering A'1 would lead us to longer and more intricate
methods of proof.



4 Estimation procedure 
4.1 The moment method 

Inference is based on moment formulas. From (5) we derive the two following
normal equations (multiply by $\langle X_i, \cdot \rangle$ and
$\langle X_i', \cdot \rangle$ successively, then take expectations):

\[ \begin{cases} \delta = \Gamma \phi + \Gamma' \psi, \\ \delta' = \Gamma'^{*} \phi + \Gamma'' \psi, \end{cases} \tag{6} \]

where $\Gamma$, $\Gamma'$, $\Gamma'^{*}$, $\Gamma''$ are the covariance and
cross-covariance operators of the couple $(X_i, X_i')_{1 \leq i \leq n}$
defined by:

\[ \Gamma = E(X \otimes_W X), \quad \Gamma'^{*} = E(X \otimes_W X'), \]
\[ \Gamma' = E(X' \otimes_L X), \quad \Gamma'' = E(X' \otimes_L X'), \]

and

\[ \delta = E(yX) \in W, \quad \delta' = E(yX') \in L. \]

Under assumption A1 or A'1 the covariance operators belong to
$\mathcal{HS}(W)$, $\mathcal{HS}(W, L)$, $\mathcal{HS}(L, W)$ or to
$\mathcal{HS}(L)$. Besides, the covariance and cross-covariance operators
mentioned above are linked through the relations

\[ \Gamma'^{*} = D\Gamma, \quad \Gamma'' = D\Gamma'. \]

Solving the system (6) is apparently easy but we should be aware of
two facts:

• Operators (here $\Gamma$, $\Gamma'$, ...) do not commute!

• The inverse operators of $\Gamma$ and $\Gamma''$ do not necessarily exist
and when they do, they are unbounded, i.e. not continuous (recall that
$\Gamma$ and $\Gamma''$ are compact operators and that compact operators have
no bounded inverses).

Before trying to solve (6) we will first study identifiability of the unknown
infinite dimensional parameter $(\phi, \psi) \in W \times L$ in the next
subsection. We complete our definitions and notations first.

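We start from a sample $(y_i, X_i)_{1 \leq i \leq n}$. By $\Gamma_n$,
$\Gamma'_n$, $\Gamma'^{*}_n$, $\Gamma''_n$, $\delta_n$ and $\delta'_n$ we
denote the empirical counterparts of the operators and vectors introduced
above and based on the sample $(y_i, X_i, X_i')_{1 \leq i \leq n}$. For
example:

\[ \Gamma_n = \frac{1}{n} \sum_{k=1}^{n} X_k \otimes_W X_k, \tag{7} \]

\[ \Gamma'_n = \frac{1}{n} \sum_{k=1}^{n} X'_k \otimes_L X_k, \]

\[ \delta_n = \frac{1}{n} \sum_{k=1}^{n} y_k X_k. \]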
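
In coordinates these empirical operators are plain matrices, and the relations $\Gamma'^{*} = D\Gamma$ and $\Gamma'' = D\Gamma'$ carry over to the empirical versions. The sketch below is only an illustration under loud assumptions: curves are replaced by coordinate vectors in fixed orthonormal bases (so scalar products are Euclidean and adjoints are transposes), `D` is a hypothetical matrix standing in for the differential operator, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 6
D = rng.normal(size=(p, p))        # hypothetical stand-in for the operator D
X = rng.normal(size=(n, p))        # row k: coordinates of X_k
X = X - X.mean(axis=0)             # centered, as assumed in the model
Xp = X @ D.T                       # row k: coordinates of X'_k = D X_k
y = rng.normal(size=n)             # placeholder responses

# (u ⊗ v)(h) = <u, h> v is the matrix v u^T, hence:
Gamma_n = X.T @ X / n              # (1/n) sum_k X_k ⊗ X_k
Gamma_p_n = X.T @ Xp / n           # Gamma'_n  = (1/n) sum_k X'_k ⊗ X_k   (maps L to W)
Gamma_ps_n = Xp.T @ X / n          # Gamma'*_n = (1/n) sum_k X_k ⊗ X'_k   (maps W to L)
Gamma_pp_n = Xp.T @ Xp / n         # Gamma''_n
delta_n = X.T @ y / n              # (1/n) sum_k y_k X_k
delta_p_n = Xp.T @ y / n           # (1/n) sum_k y_k X'_k
```

The identities $\Gamma'^{*}_n = D\Gamma_n$ and $\Gamma''_n = D\Gamma'_n$ then hold exactly, which is the source of the collinearity discussed later in the paper.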

4.2 Identifiability 

Both equations in (6) are the starting point of the estimation procedure. We
should make sure that solutions to these equations are well and uniquely
defined. Suppose for instance that $\mathrm{Ker}\,\Gamma \neq \{0\}$ and take
$h$ in it. Now set $\tilde{\phi} = \phi + h$. Then

\[ \Gamma \tilde{\phi} = \Gamma \phi + \Gamma h = \Gamma \phi. \]

So $\Gamma \tilde{\phi} = \Gamma \phi$ and since $\Gamma'^{*} = D\Gamma$ it is
plain that $\Gamma'^{*} \tilde{\phi} = \Gamma'^{*} \phi$. Consequently
$\tilde{\phi}$ is another solution to (6). There are indeed even infinitely
many solutions in the space $\phi + \mathrm{Ker}\,\Gamma$. For similar reasons
about $\psi$ we should impose $\mathrm{Ker}\,T = \{0\}$ for
$T \in \{\Gamma, \Gamma', \Gamma'^{*}, \Gamma''\}$. It turns out that the only
necessary assumption is

A2: $\mathrm{Ker}\,\Gamma = \mathrm{Ker}\,\Gamma'' = \{0\}$.

It is easily seen that A2 implies
$\mathrm{Ker}\,\Gamma' = \mathrm{Ker}\,\Gamma'^{*} = \{0\}$. In other words
we suppose that both operators $\Gamma$ and $\Gamma''$ above are one to one.
We are now ready to solve the identification problem.

Proposition 1 The couple $(\phi, \psi) \in W \times L$ is identifiable for the
moment method proposed in (6) if and only if A2 holds and
$(\phi, \psi) \notin \mathcal{N}$ where $\mathcal{N}$ is the vector subspace
of $W \times L$ defined by:

\[ \mathcal{N} = \{ (\phi, \psi) : \phi + D^{*}\psi = 0 \}. \tag{8} \]

The above Proposition is slightly abstract but (8) may be simply rewritten:
$(\phi, \psi) \in \mathcal{N}$ whenever for all functions $f$ in $W$,

\[ \int_0^1 \left( f\phi + f'\phi' + f'\psi \right) = 0. \]

Note that $\mathcal{N}$ is a closed set in $W \times L$. From now on we will
assume that

A3: $(\phi, \psi) \notin \mathcal{N}$.

5 Definition of the estimates 

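The estimates stem from (6) which is a non invertible system. Under
assumption A2 the solution exists and is unique:

\[ \begin{cases} \phi = (\Gamma - \Gamma' \Gamma''^{-1} \Gamma'^{*})^{-1} \left[ \delta - \Gamma' \Gamma''^{-1} \delta' \right], \\ \psi = (\Gamma'' - \Gamma'^{*} \Gamma^{-1} \Gamma')^{-1} \left[ \delta' - \Gamma'^{*} \Gamma^{-1} \delta \right]. \end{cases} \tag{9} \]

Let us denote

\[ S_\phi = \Gamma - \Gamma' \Gamma''^{-1} \Gamma'^{*}, \]

\[ S_\psi = \Gamma'' - \Gamma'^{*} \Gamma^{-1} \Gamma'. \]

The reader should note two crucial facts. On the one hand $\Gamma^{-1}$ and
$\Gamma''^{-1}$ are unbounded operators but closed graph arguments ensure that
$\Gamma' \Gamma''^{-1} \delta'$ and $\Gamma'^{*} \Gamma^{-1} \delta$ exist in
$W$ and $L$ respectively. On the other hand
$\delta - \Gamma' \Gamma''^{-1} \delta'$ (resp.
$\delta' - \Gamma'^{*} \Gamma^{-1} \delta$) belongs to the domain of the
unbounded operator $S_\phi^{-1}$ (resp. $S_\psi^{-1}$), which also ensures the
finiteness of both solutions given in the display above.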

Finding approximations to the solutions of (6) is known in the mathematical
literature as "solving a linear inverse problem". The book by Tikhonov
and Arsenin (1977), as well as many other references therein, is devoted to
this theory, which is well known in image reconstruction. The unboundedness of
$S_\phi^{-1}$ may cause large variations of $S_\phi^{-1} x$ even for small
variations of $x$. This lack of stability turns out to damage, as well as the
traditional "curse of dimensionality", the rates of convergence of our
estimates.

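Unfortunately we cannot simply replace "theoretical" operators and vectors
by their empirical estimates because $\Gamma_n$ and $\Gamma''_n$ are not
invertible. Indeed they are finite-rank operators (for example the image of
$\Gamma_n$ is $\mathrm{span}(X_1, \dots, X_n)$) hence not even injective. We
are classically going to add a small perturbation to regularize $\Gamma_n$ and
$\Gamma''_n$ (see Tikhonov and Arsenin (1977)) and another one for
$S_{n,\phi}$ and $S_{n,\psi}$, in order to make them invertible. At last
$\Gamma^{-1}$ is approximated by
$\Gamma_n^{\dagger} = (\Gamma_n + \alpha_n I)^{-1}$, $\Gamma''^{-1}$ by
$\Gamma_n''^{\dagger} = (\Gamma''_n + \alpha_n I)^{-1}$ and $S_\phi^{-1}$ by
$(S_{n,\phi} + \beta_n I)^{-1}$ where

\[ S_{n,\phi} = \Gamma_n - \Gamma'_n \,\Gamma_n''^{\dagger}\, \Gamma'^{*}_n, \tag{10} \]

$S_{n,\psi}$ is defined analogously from $S_\psi$, and $\alpha_n > 0$,
$\beta_n > 0$. We also set:

\[ u_{n,\phi} = \delta_n - \Gamma'_n \,\Gamma_n''^{\dagger}\, \delta'_n, \tag{11} \]
\[ u_{n,\psi} = \delta'_n - \Gamma'^{*}_n \,\Gamma_n^{\dagger}\, \delta_n. \tag{12} \]

In the sequel we will assume that both strictly positive sequences $\alpha_n$
and $\beta_n$ decay to zero in order to get the asymptotic convergence of the
estimates.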
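
The effect of the first perturbation can be seen on a toy example. This is a minimal sketch with simulated data and a hypothetical helper `regularized_inverse`; it only illustrates that an empirical covariance matrix built from $n < p$ observations is singular, while the shifted matrix is invertible with an inverse of norm $1/\alpha$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 20                                # fewer curves than coordinates
X = rng.normal(size=(n, p))
Gamma_n = X.T @ X / n                       # rank at most n, hence not invertible
rank = np.linalg.matrix_rank(Gamma_n)

def regularized_inverse(G, alpha):
    """Tikhonov-type inverse (G + alpha I)^{-1} of a symmetric positive matrix."""
    return np.linalg.inv(G + alpha * np.eye(G.shape[0]))

alpha = 0.1
Gamma_dag = regularized_inverse(Gamma_n, alpha)
norm_dag = np.linalg.norm(Gamma_dag, 2)     # equals 1 / alpha since 0 is an eigenvalue
```

Smaller values of `alpha` reduce the bias of the approximation but inflate the norm of the regularized inverse, which is the trade-off driving the rates below.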



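Definition 2 The estimate of the couple $(\phi, \psi)$ is $(\phi_n, \psi_n)$,
based on (9) and defined by:

\[ \begin{cases} \phi_n = (S_{n,\phi} + \beta_n I)^{-1} u_{n,\phi}, \\ \psi_n = (S_{n,\psi} + \beta_n I)^{-1} u_{n,\psi}. \end{cases} \tag{13} \]

The predictor is defined as

\[ \hat{y}_{n+1} = \langle \phi_n, X_{n+1} \rangle_W + \langle \psi_n, X'_{n+1} \rangle_L. \]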
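
The doubly penalized estimates and the predictor can be sketched in a finite-dimensional analogue. Everything below is an assumption made for illustration, not the authors' implementation: curves are coordinate vectors with Euclidean scalar products, `D` is a first-difference matrix standing in for differentiation, the data are simulated, and the names `flrd_fit` and `flrd_predict` are hypothetical. The values of `alpha` and `beta` are arbitrary.

```python
import numpy as np

def flrd_fit(X, Xp, y, alpha, beta):
    """Doubly penalized moment estimates (phi_n, psi_n) in Euclidean coordinates."""
    n, p = X.shape
    q = Xp.shape[1]
    G = X.T @ X / n                    # Gamma_n
    Gp = X.T @ Xp / n                  # Gamma'_n   (maps L to W)
    Gps = Gp.T                         # Gamma'*_n  (maps W to L)
    Gpp = Xp.T @ Xp / n                # Gamma''_n
    d = X.T @ y / n                    # delta_n
    dp = Xp.T @ y / n                  # delta'_n
    G_dag = np.linalg.inv(G + alpha * np.eye(p))       # first penalization
    Gpp_dag = np.linalg.inv(Gpp + alpha * np.eye(q))
    S_phi = G - Gp @ Gpp_dag @ Gps
    S_psi = Gpp - Gps @ G_dag @ Gp
    u_phi = d - Gp @ Gpp_dag @ dp
    u_psi = dp - Gps @ G_dag @ d
    phi_n = np.linalg.solve(S_phi + beta * np.eye(p), u_phi)   # second penalization
    psi_n = np.linalg.solve(S_psi + beta * np.eye(q), u_psi)
    return phi_n, psi_n, S_phi, S_psi

def flrd_predict(phi_n, psi_n, x_new, xp_new):
    """Predictor <phi_n, x_new> + <psi_n, x_new'>."""
    return x_new @ phi_n + xp_new @ psi_n

# Simulated toy data (all choices below are hypothetical).
rng = np.random.default_rng(4)
n, p = 300, 8
D = np.diff(np.eye(p), axis=0)         # (p-1) x p first-difference matrix
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
Xp = X @ D.T
phi_true = rng.normal(size=p)
psi_true = rng.normal(size=p - 1)
y = X @ phi_true + Xp @ psi_true + 0.1 * rng.normal(size=n)

phi_n, psi_n, S_phi, S_psi = flrd_fit(X, Xp, y, alpha=0.07, beta=0.15)
y_hat = flrd_predict(phi_n, psi_n, X[0], Xp[0])
```

In this analogue `S_phi` and `S_psi` are Schur complements of a positive block matrix, hence positive, so adding `beta * I` always yields an invertible system.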
6 Main results and comments

In Mas and Pumo (2006) the authors obtained convergence in probability for
their estimates in a quite different model. We are now in position to state
deeper results. The mean square prediction error is indeed given an asymptotic
expansion depending on both smoothing sequences $\alpha_n$ and $\beta_n$.

Before stating the main result of this article, we give and comment on the
next and last assumption:

A4: \[ \left\| \Gamma^{-1/2} \phi \right\|_W < +\infty, \qquad \left\| \Gamma''^{-1/2} \psi \right\|_L < +\infty. \tag{14} \]


For the definition of $\Gamma^{-1/2}$ and $\Gamma''^{-1/2}$ we refer to (4).
Let us explain briefly what both conditions in (14) mean. To that aim we
rewrite the first by developing $\Gamma^{-1/2}$ in a basis of eigenvectors of
$\Gamma$, say $u_p$:

\[ \Gamma^{-1/2} \phi = \sum_{p=1}^{+\infty} \frac{\langle \phi, u_p \rangle}{\sqrt{\lambda_p}}\, u_p \]

hence

\[ \left\| \Gamma^{-1/2} \phi \right\|_W^2 = \sum_{p=1}^{+\infty} \frac{\langle \phi, u_p \rangle^2}{\lambda_p}. \]

The first part of assumption A4 tells us that "$\langle \phi, u_p \rangle$
should tend to zero quickly enough with respect to $\lambda_p$". In other
words $\phi$ should belong to an ellipsoid of $W$ which may be more or less
"flat" depending on the rate of decay of the $\lambda_p$'s to zero. Assumption
A4 is in fact a regularity condition on the functions $\phi$ and $\psi$: the
function $\phi$ (resp. $\psi$) should be smoother than $X$ (resp. $X'$).



We could try and state convergence results for $\phi_n$ and $\psi_n$
separately but it turns out that:

• The real statistical interest of the model relies on its predictive power.
The statistician is mainly interested in $\hat{y}_{n+1}$, not in $\phi_n$ and
$\psi_n$ in a first attempt. The issue of goodness of fit tests (involving
$\phi$ and $\psi$ alone) is beyond the scope of this article.

• Considering the mean square norm of $\langle \phi_n, X_{n+1} \rangle_W$
(instead of $\phi_n$ or even of $\langle \phi_n, x \rangle_W$ for a nonrandom
$x$) has a smoothing effect on our estimates and partially counterbalances the
side effects of the underlying inverse problem, as will be seen within the
proofs (especially along Lemma 11).

Turning to $\hat{y}_{n+1}$, the next question is: what should we compare
$\hat{y}_{n+1}$ with? The right answer is not $y_{n+1}$. Obviously we could,
but it is also plain that, due to the random $\varepsilon_{n+1}$, the best
possible prediction for $y_{n+1}$ knowing $X_{n+1}$ (or even the "past" i.e.
$X_1, \dots, X_n$) is the conditional expectation:

\[ y^{*}_{n+1} = E(y_{n+1} \mid X_1, \dots, X_{n+1}) = \langle \phi, X_{n+1} \rangle_W + \langle \psi, X'_{n+1} \rangle_L. \]

We are now ready to state the main theoretical result of this article. 

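Theorem 3 When assumptions A1 to A4 hold the following expansion is
valid for the prediction mean square error:

\[ E\left( \hat{y}_{n+1} - y^{*}_{n+1} \right)^2 = O\!\left( \frac{\beta_n^2}{\alpha_n^2} \right) + O\!\left( \frac{1}{\alpha_n^2 \beta_n^2 n} \right). \]

Remark 4 Replacing $y^{*}_{n+1}$ with $y_{n+1}$ is still possible. We may
easily prove that:

\[ E\left( \hat{y}_{n+1} - y_{n+1} \right)^2 = E\left( \hat{y}_{n+1} - y^{*}_{n+1} \right)^2 + \sigma_\varepsilon^2. \]

Corollary 5 From Theorem 3 above an optimal choice for $\beta$ is
$\beta^{*} \asymp n^{-1/4}$, then the convergence rate is:

\[ E\left( \hat{y}_{n+1} - y^{*}_{n+1} \right)^2 = O\!\left( \frac{1}{\alpha_n^2\, n^{1/2}} \right) \]

and may be quite close to $1/n^{1/2}$.

The proof of the Corollary will be omitted. Studying the optimality of
this rate of convergence over the classes of functions defined by A4 is beyond
the scope of this article but could deserve more attention.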
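
The choice $\beta^{*} \asymp n^{-1/4}$ can be illustrated by minimizing a two-term bound numerically. The sketch assumes the bound has the form $\beta^2/\alpha^2 + 1/(\alpha^2\beta^2 n)$ with all constants set to one; this form, the function name `error_bound` and the numerical values are assumptions made for illustration.

```python
import numpy as np

def error_bound(alpha, beta, n):
    """Assumed two-term bound: squared bias beta^2/alpha^2 plus variance 1/(alpha^2 beta^2 n)."""
    return beta**2 / alpha**2 + 1.0 / (alpha**2 * beta**2 * n)

n, alpha = 10_000, 0.5
betas = np.linspace(0.01, 1.0, 10_000)
beta_best = betas[np.argmin(error_bound(alpha, betas, n))]   # grid minimizer
beta_star = n ** -0.25                                       # closed form from beta^4 = 1/n
```

At $\beta^{*}$ both terms are equal, and the bound reduces to $2/(\alpha^2 \sqrt{n})$, which explains why the rate approaches $n^{-1/2}$ when $\alpha_n$ decays slowly.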

Remark 6 Originally the linear model is subject to serious multicollinearity
troubles since $X'_n = DX_n$. Even if the curve $X'_n$ usually looks quite
different from $X_n$, there is a total stochastic dependence between them. The
method used in this article to tackle this problem (as well as the intrinsic
"inverse problem" aspects related to the inversion of the covariance operators
$\Gamma$ and $\Gamma''$) is new to the authors' knowledge. As can be seen at
display (13) above or in the proofs below, it relies on a double penalization
technique, first by the index $\alpha_n$ then by $\beta_n$, linking both
indexes in order to suppress the bias terms asymptotically.

Figure 1: Centered original spectra of four samples (256 measures)



7 An application to spectrometric data 

In this section we present an application of the Functional Linear
Regression with Derivatives (FLRD) introduced in this paper to a spectroscopic
calibration problem. Quantitative NIR (near-infrared) spectroscopy is used
to analyze food and agricultural materials. The NIR spectrum of a sample
is a continuous curve giving the absorption, that is $\log_{10}(1/R)$ where
$R$ is the reflection of the sample, against wavelength measured in nanometers
(nm).

In the cookie example considered here the aim is to predict the percentage
of each ingredient $y$ given the NIR spectrum $x$ of the sample (see Osborne
et al. (1984) for a full description of the experiment). The constituents under
investigation are: fat, sucrose, dry flour, and water. There were 39 samples
in the calibration set, sample number 23 having been excluded from the
original 40 as an outlier, and a further validation set with 31 samples, again
after the exclusion of one outlier.

An NIR reflectance spectrum is available for each dough. The original
spectral data consist of 700 points measured from 1100 to 2498 nm in steps
of 2 nm. Following Brown et al. (2001) we reduced the number of spectral
points to 256 by considering only the spectral range 1380-2400 nm in steps of
4 nm. Samples of centered spectra are plotted in Figure 1.

A classical tool employed in the chemometric literature for the prediction
of $y$ knowing the associated NIR spectrum $(x_j, j = 1, \dots, 256)$ is the
linear model:

\[ y = \sum_{j=1}^{256} \theta_j x_j + \varepsilon. \tag{15} \]

The problem then is to use the calibration data to estimate the unknown
parameters $\theta_j$. Clearly in this application, since $39 \ll 256$,
ordinary least squares fails and many authors proposed to use alternative
methods to tackle the problem: principal component regression (PCR) or partial
least squares regression (PLS). We invite the reader to look at the paper of
Frank and Friedman (1993) for a statistical view of some chemometrics
regression tools.

Following an idea of Hastie and Mallows, in their discussion of Frank and
Friedman's paper, we consider a spectrum as a functional observation. The
Functional Linear Regression (FLR) corresponding to the model (15) defined
above is:

\[ y = \int_{\mathcal{S}} x(t)\,\theta(t)\,dt + \varepsilon \]

where $y$ is a scalar random variable, $x$ a real function defined on
$\mathcal{S} = [1100, 2400]$ and $\theta(t)$ the unknown parameter function.
Brown et al. (2001), Ferraty and Vieu (2003), Marx and Eilers (2002) or Amato
et al. (2006) used such a model for a prediction problem with spectrometric
data.

The FLRD model introduced in this paper can be written as:

\[ y = \int_{\mathcal{S}} x(t)\,\phi(t)\,dt + \int_{\mathcal{S}} x'(t)\,\psi(t)\,dt + \varepsilon \]

where $\phi(t)$ and $\psi(t)$ are unknown functions (see display (5) for an
equivalent definition). In this paragraph we compare the performance of PCR,
PLS, FLR, FLRD, the Spline Smoothing model proposed by Cardot, Ferraty
and Sarda (2006) and the Bayes wavelet predictions proposed by Brown et al.
(2001).

We used the calibration data set for the estimation of the parameter functions
$\phi(t)$ and $\psi(t)$ and the validation data for the calculation of the
MSEP (Mean Squared Error of Prediction):

\[ \mathrm{MSEP} = \frac{1}{31} \sum_{j=1}^{31} (y_j - \hat{y}_j)^2 \]

where $\hat{y}_j$ is the prediction of $y_j$ obtained by the model with
estimated parameters. The choice of the parameters $\alpha$ and $\beta$ is
crucial for the prediction model. We used a cross-validation approach based on
the evaluation of the standard error of prediction CVMSEP:

\[ \mathrm{CVMSEP}(\alpha, \beta) = \frac{1}{39} \sum_{i=1}^{39} \frac{1}{38} \sum_{j \neq i} \left( y_j - \hat{y}_j(i; \alpha, \beta) \right)^2 \]

where $\hat{y}_j(i; \alpha, \beta)$ denotes the prediction of $y_j$ in the
calibration set without sample $i$. Results for different methods of
prediction of the four ingredients are displayed in Table 1. We used a
B-spline basis ($k = 100$) for obtaining predictions with the Spline
Smoothing, Spline Ridge FLR and Spline FLRD methods. For each of those methods
we give the values of the smoothing or penalty parameters based on an
analogous cross-validation approach.
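
The selection of penalty parameters by cross-validation can be sketched generically. Two substitutions are made and should be kept in mind: the code uses the textbook leave-one-out criterion, in which the held-out sample itself is predicted, whereas the criterion described in the text averages over the remaining calibration samples; and a plain ridge regression (`ridge_fit`, `ridge_predict`, both hypothetical helpers) stands in for the functional models. The data are simulated.

```python
import numpy as np

def msep(y_true, y_pred):
    """Mean squared error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def ridge_fit(X, y, penalty):
    """Stand-in model: ridge coefficients (X^T X + penalty I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(p), X.T @ y)

def ridge_predict(coef, X):
    return X @ coef

def loo_cv(X, y, fit, predict, **params):
    """Textbook leave-one-out error: fit without sample i, then predict sample i."""
    n = len(y)
    predictions = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep], **params)
        predictions[i] = predict(model, X[i:i + 1])[0]
    return msep(y, predictions)

rng = np.random.default_rng(5)
X = rng.normal(size=(39, 10))                     # 39 simulated calibration samples
y = X[:, 0] + 0.1 * rng.normal(size=39)
grid = [0.01, 0.1, 1.0, 10.0]
scores = {lam: loo_cv(X, y, ridge_fit, ridge_predict, penalty=lam) for lam in grid}
best_penalty = min(scores, key=scores.get)
```

For a model with two parameters such as $(\alpha, \beta)$, the same loop would simply run over a two-dimensional grid.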







                                             MSE Validation
Method and parameters                  Fat      Sugar    Flour    Water
PLS                                    0.151    0.583    0.375    0.105
PCR                                    0.160    0.614    0.388    0.106
Spline Smoothing (k_n = 8)             0.546    0.471    2.226    0.183
Spline Ridge FLR (penalty = 0.00002)   0.044    0.494    0.318    0.087
Spline FLRD (α = 0.07, β = 0.15)       0.092    0.450    0.332    0.069
Bayes Wavelet                          0.063    0.449    0.348    0.050

Table 1: MSEP criterion for all models (see Brown et al. (2001) for results of
PLS, PCR and Bayes wavelet methods).

We note that functional approaches work better than PLS or PCR methods
for the four predicted variables with respect to the MSEP criterion. Our
simulation, as noted also by Marx and Eilers (2002), shows that functional
methods lead to more stable predictions. The Spline FLRD method produces
in general results equivalent, in terms of predictions, to those of the best
methods presented in Table 1.



8 Proofs 

In the sequel $M$ and $M'$ will stand for constants.

Let $S$ and $T$ be two selfadjoint linear operators on a Hilbert space $H$. We
write $T \ll S$ whenever, for all $x$ in $H$,
$\langle Tx, x \rangle \leq \langle Sx, x \rangle$; then
$\|T\|_\infty \leq \|S\|_\infty$.

The norm in the space $L^2(B)$, where $(B, \|\cdot\|_B)$ is a Banach space, is
defined the following way: let $X$ be a random element in the Banach space
$B$, then

\[ \|X\|_{L^2(B)} = \left( E\|X\|_B^2 \right)^{1/2}. \]

When the notation is not ambiguous we systematically drop the index $B$, i.e.
$\|X\|_{L^2} = \left( E\|X\|_B^2 \right)^{1/2}$.

8.1 Preliminary facts

In order to gain some clarity in the proofs and to alleviate them we first list
a few results stemming from operator or probability theory.

Fact 1: If $T$ is a positive operator (either random or not), $T + \gamma I$
is invertible for all $\gamma > 0$ with bounded inverse and
$\|(T + \gamma I)^{-1}\|_\infty \leq \gamma^{-1}$. Hence

\[ \|\Gamma^{\dagger}\|_\infty = \|\Gamma_n^{\dagger}\|_\infty = \|\Gamma''^{\dagger}\|_\infty = \|\Gamma_n''^{\dagger}\|_\infty = \alpha^{-1}. \tag{16} \]



Fact 2: As a consequence of assumption A1 and of the strong law of large
numbers for Hilbert valued random elements (see Ledoux and Talagrand (1991),
Chapter 7),

\[ T_n \underset{n \to +\infty}{\longrightarrow} T \quad \text{a.s.} \]

whenever $T_n = \Gamma_n, \Gamma'_n, \Gamma'^{*}_n, \Gamma''_n$ (resp.
$T = \Gamma, \Gamma', \Gamma'^{*}, \Gamma''$) since all these random operators
may be rewritten as sums of i.i.d. random variables. These sequences of random
operators are almost surely bounded

\[ \sup_n \|T_n\|_\infty < M \quad \text{a.s.} \tag{17} \]

which also means that

\[ \max\left\{ \sup_n \|\delta_n\|_W,\ \sup_n \|\delta'_n\|_L \right\} < M' \tag{18} \]

since (for instance) $\delta_n = \Gamma_n \phi + \Gamma'_n \psi + e_n$ where
$e_n$ is again a sum of i.i.d. random elements:

\[ e_n = \frac{1}{n} \sum_{k=1}^{n} X_k \varepsilon_k. \]

We also set

\[ e'_n = \frac{1}{n} \sum_{k=1}^{n} X'_k \varepsilon_k \]

(see below for details).

Fact 3: The Central Limit Theorem in Hilbert spaces (or standard results
on rates of convergence for Hilbert valued random elements in square
norm) provides a rate in the $L^2$ convergence of several random variables of
interest in the proofs. See for instance Ledoux and Talagrand (1991) or Bosq
(2000). Whenever $T_n = \Gamma_n, \Gamma'_n, \Gamma'^{*}_n, \Gamma''_n$ (resp.
$T = \Gamma, \Gamma', \Gamma'^{*}, \Gamma''$) we have

\[ E\|T_n - T\|_{\mathcal{HS}}^2 = O\!\left( \frac{1}{n} \right) \quad \text{hence} \quad \|T_n - T\|_\infty = O_{L^2}\!\left( \frac{1}{\sqrt{n}} \right) \tag{19} \]

since all these random operators may be rewritten as sums of i.i.d. random
variables.

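We begin with proving Proposition 1.
Proof of Proposition 1:

The method of the proof may be adapted from the model studied in Mas and
Pumo (2006). The couple $(\phi, \psi)$ will be identified whenever, for any
other couple $(\phi_a, \psi_a)$, if

\[ \begin{cases} \delta = \Gamma \phi + \Gamma' \psi = \Gamma \phi_a + \Gamma' \psi_a, \\ \delta' = \Gamma'^{*} \phi + \Gamma'' \psi = \Gamma'^{*} \phi_a + \Gamma'' \psi_a, \end{cases} \]

then $(\phi_a, \psi_a) = (\phi, \psi)$. This will be true if

\[ \begin{cases} \Gamma (\phi - \phi_a) + \Gamma' (\psi - \psi_a) = 0, \\ \Gamma'^{*} (\phi - \phi_a) + \Gamma'' (\psi - \psi_a) = 0 \end{cases} \]

implies $(\phi - \phi_a, \psi - \psi_a) = (0, 0)$. The system above means that
the couple $(\phi - \phi_a, \psi - \psi_a)$ belongs to the kernel of the
linear operator defined blockwise on $W \times L$ by:

\[ \begin{pmatrix} \Gamma & \Gamma' \\ \Gamma'^{*} & \Gamma'' \end{pmatrix}. \]

As $\Gamma'^{*} = D\Gamma$ and $\Gamma'' = D\Gamma'$, the Proposition will be
proved if the blockwise operator defined on $W \times L$ and with values in
$W$:

\[ \begin{pmatrix} \Gamma & \Gamma' \end{pmatrix} = \begin{pmatrix} \Gamma & \Gamma D^{*} \end{pmatrix} \]

is one to one. It is plain that the kernel of this operator is precisely the
space $\mathcal{N}$ that appears at display (8).

This finishes the proof of the Proposition.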

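The next two general Propositions are proved for later use.

Proposition 7

\[ \sup_n \left\| (\Gamma_n''^{\dagger})^{1/2}\, \Gamma'^{*}_n \right\|_\infty < M \ \text{a.s.}, \qquad \sup_n \left\| \Gamma'_n\, (\Gamma_n''^{\dagger})^{1/2} \right\|_\infty < M \ \text{a.s.}, \]

\[ \sup \left\| (\Gamma''^{\dagger})^{1/2}\, \Gamma'^{*} \right\|_\infty < M, \qquad \sup \left\| \Gamma'\, (\Gamma''^{\dagger})^{1/2} \right\|_\infty < M. \]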



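Proof. We prove only the first bound since the method may be copied for
the other ones. Set $R_n = D\Gamma_n^{1/2}$; then
$\Gamma''_n = R_n R_n^{*}$, $\Gamma'^{*}_n = R_n \Gamma_n^{1/2}$ and

\[ (\Gamma_n''^{\dagger})^{1/2}\, \Gamma'^{*}_n = (R_n R_n^{*} + \alpha I)^{-1/2} R_n \Gamma_n^{1/2}. \]

At last,

\[ \left\| (\Gamma_n''^{\dagger})^{1/2}\, \Gamma'^{*}_n \right\|_\infty \leq \left\| (R_n R_n^{*} + \alpha I)^{-1/2} R_n \right\|_\infty \left\| \Gamma_n^{1/2} \right\|_\infty. \]

It is plain that

\[ \sup_n \left\| \Gamma_n^{1/2} \right\|_\infty < M \quad \text{a.s.} \]

If the Schmidt decomposition of $R_n$ is:

\[ R_n = \sum_{k \in \mathbb{N}} \mu_{k,n}\, (u_{k,n} \otimes v_{k,n}), \]

($u_{k,n} \in W$, $v_{k,n} \in L$) it is simple algebra to get:

\[ (R_n R_n^{*} + \alpha I)^{-1/2} R_n = \sum_{k \in \mathbb{N}} \frac{\mu_{k,n}}{\sqrt{\alpha + \mu_{k,n}^2}}\, (u_{k,n} \otimes v_{k,n}) \]

which yields

\[ \left\| (R_n R_n^{*} + \alpha I)^{-1/2} R_n \right\|_\infty = \sup_k \frac{\mu_{k,n}}{\sqrt{\alpha + \mu_{k,n}^2}} \leq 1. \]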
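
The key inequality of this proof, namely that $(RR^{*} + \alpha I)^{-1/2} R$ has norm at most one whatever the size of $R$ and of $\alpha$, is easy to check on matrices. The sketch below uses an arbitrary random matrix `R` and arbitrary values of `alpha`; it is an illustration, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
R = 10.0 * rng.normal(size=(7, 5))         # deliberately large matrix playing R_n
mu = np.linalg.svd(R, compute_uv=False)    # characteristic numbers of R

norms = {}
predicted = {}
for alpha in (1e-3, 1e-1, 1.0):
    M = R @ R.T + alpha * np.eye(7)
    lam, E = np.linalg.eigh(M)             # M is symmetric positive definite
    M_inv_sqrt = (E / np.sqrt(lam)) @ E.T  # (R R^T + alpha I)^{-1/2}
    norms[alpha] = np.linalg.norm(M_inv_sqrt @ R, 2)
    predicted[alpha] = np.max(mu / np.sqrt(alpha + mu**2))
```

The computed norms agree with $\sup_k \mu_k / \sqrt{\alpha + \mu_k^2}$ and stay below one even though $\|R\|_\infty$ is large and $\alpha$ is small.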



Proposition 8

\[ \left\| (S_n + \beta I)^{-1} \right\|_\infty \leq \frac{1}{\beta}. \tag{20} \]

Proof. The proof of this Proposition is similar to that of Lemma 7.4 in Mas
and Pumo (2006). It was then proved for $S$ instead of $S_n$ and all operators
should be changed to their empirical counterparts (e.g. $\Gamma_n$ instead of
$\Gamma$). We give a sketch of it. The proof relies on the Schmidt
decomposition of $S_n$. One would get

\[ S_n = \Gamma_n^{1/2} A_n(\alpha)\, \Gamma_n^{1/2} \]

where $A_n(\alpha)$ and $\Gamma_n^{1/2}$ are symmetric positive operators,
which implies that $S_n$ itself is positive. It suffices then to apply Fact 1
(see the "Preliminary facts" subsection) to get the desired result. ■
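8.2 Outline of the proof of Theorem 3

The following bound is valid:

\[ \left[ \hat{y}_{n+1} - \left( \langle \phi, X_{n+1} \rangle_W + \langle \psi, X'_{n+1} \rangle_L \right) \right]^2 \leq 2 \langle \phi_n - \phi, X_{n+1} \rangle_W^2 + 2 \langle \psi_n - \psi, X'_{n+1} \rangle_L^2. \]

Then

\[ E \langle \phi_n - \phi, X_{n+1} \rangle_W^2 = E\left[ E\left( \langle \phi_n - \phi, X_{n+1} \rangle_W^2 \mid X_1, \dots, X_n \right) \right] = E \left\| \Gamma^{1/2} (\phi_n - \phi) \right\|_W^2. \]

Similarly,

\[ E \langle \psi_n - \psi, X'_{n+1} \rangle_L^2 = E \left\| \Gamma''^{1/2} (\psi_n - \psi) \right\|_L^2. \]

Both preceding equations feature similar expressions. We focus on the term
involving $\phi$; we will prove that:

\[ E \left\| \Gamma^{1/2} (\phi_n - \phi) \right\|_W^2 = O\!\left( \frac{\beta^2}{\alpha^2} \right) + O\!\left( \frac{1}{\alpha^2 \beta^2 n} \right). \]

Within the proof the reader will easily be convinced that the method would
lead to an analogous result for the term with $\psi$. From now on, in order to
alleviate notations, we drop the index $\phi$ in $S_{n,\phi}$ and
$u_{n,\phi}$. The sequences $(\alpha_n)_{n \in \mathbb{N}}$ and
$(\beta_n)_{n \in \mathbb{N}}$ will be denoted $\alpha$ and $\beta$
respectively, for short.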
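We start from

\[ \phi_n = \left( \Gamma_n - \Gamma'_n \Gamma_n''^{\dagger} \Gamma'^{*}_n + \beta I \right)^{-1} \left( \delta_n - \Gamma'_n \Gamma_n''^{\dagger} \delta'_n \right) = (S_n + \beta I)^{-1} u_n, \]

\[ \phi = \left( \Gamma - \Gamma' \Gamma''^{\dagger} \Gamma'^{*} \right)^{-1} \left( \delta - \Gamma' \Gamma''^{\dagger} \delta' \right) = S^{-1} u, \]

where we recall that

\[ u = \delta - \Gamma' \Gamma''^{\dagger} \delta', \qquad u_n = \delta_n - \Gamma'_n \Gamma_n''^{\dagger} \delta'_n. \]

The proof relies on the following decomposition:

\[ \phi_n - \phi = (S_n + \beta I)^{-1} (u_n - u) + \left( (S_n + \beta I)^{-1} - S^{-1} \right) u \]
\[ \phantom{\phi_n - \phi} = (S_n + \beta I)^{-1} (u_n - u) + (S_n + \beta I)^{-1} (S - S_n - \beta I)\, S^{-1} u = A_n + B_n - C_n \tag{21} \]

where

\[ A_n = (S_n + \beta I)^{-1} (u_n - u), \tag{22} \]
\[ B_n = (S_n + \beta I)^{-1} (S - S_n)\, \phi, \tag{23} \]
\[ C_n = \beta\, (S_n + \beta I)^{-1} \phi. \tag{24} \]

Along the forthcoming Lemmas we determine rates of convergence for these
three terms. We will prove that the rate of decrease to zero in $L^2$ norm
is $(\alpha \beta \sqrt{n})^{-1}$ for $A_n$ and $B_n$. The rest of the proof
of the main Theorem is postponed to the end of the next and last subsection.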

8.3 Proof of the main Theorem 

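The first Lemma gives a rate of convergence for $S_n - S$.

Lemma 9 The following holds:

\[ \|S_n - S\|_\infty = O_{L^2}\!\left( \frac{1}{\alpha \sqrt{n}} \right). \]

Proof. First of all by (19):

\[ \|\Gamma_n - \Gamma\|_\infty = O_{L^2}\!\left( \frac{1}{\sqrt{n}} \right). \]

We focus on

\[ \Gamma'_n \Gamma_n''^{\dagger} \Gamma'^{*}_n - \Gamma' \Gamma''^{\dagger} \Gamma'^{*} = (\Gamma'_n - \Gamma')\, \Gamma_n''^{\dagger} \Gamma'^{*}_n + \Gamma' \Gamma_n''^{\dagger} (\Gamma'^{*}_n - \Gamma'^{*}) + \Gamma' (\Gamma_n''^{\dagger} - \Gamma''^{\dagger})\, \Gamma'^{*}. \]

Then dealing with each of these three terms separately we get

\[ \left\| (\Gamma'_n - \Gamma')\, \Gamma_n''^{\dagger} \Gamma'^{*}_n \right\|_\infty \leq \|\Gamma'_n - \Gamma'\|_\infty \|\Gamma_n''^{\dagger}\|_\infty \|\Gamma'^{*}_n\|_\infty \leq M\, \frac{\|\Gamma'_n - \Gamma'\|_\infty}{\alpha} \quad \text{a.s.} \]

The last bound was derived from (16) and (17). In the same way,

\[ \left\| \Gamma' \Gamma_n''^{\dagger} (\Gamma'^{*}_n - \Gamma'^{*}) \right\|_\infty \leq M\, \frac{\|\Gamma'^{*}_n - \Gamma'^{*}\|_\infty}{\alpha} \quad \text{a.s.} \]

At last,

\[ \Gamma' (\Gamma_n''^{\dagger} - \Gamma''^{\dagger})\, \Gamma'^{*} = \Gamma' \Gamma_n''^{\dagger} (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \Gamma'^{*} \]
\[ \phantom{\Gamma' (\Gamma_n''^{\dagger} - \Gamma''^{\dagger})\, \Gamma'^{*}} = \Gamma'_n \Gamma_n''^{\dagger} (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \Gamma'^{*} + (\Gamma' - \Gamma'_n)\, \Gamma_n''^{\dagger} (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \Gamma'^{*}. \]

Then,

\[ \left\| \Gamma' (\Gamma_n''^{\dagger} - \Gamma''^{\dagger})\, \Gamma'^{*} \right\|_\infty \leq \left\| (\Gamma' - \Gamma'_n)\, \Gamma_n''^{\dagger} (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \Gamma'^{*} \right\|_\infty \]
\[ \qquad + \left\| \Gamma'_n (\Gamma_n''^{\dagger})^{1/2} \right\|_\infty \left\| (\Gamma_n''^{\dagger})^{1/2} (\Gamma'' - \Gamma''_n) (\Gamma''^{\dagger})^{1/2} \right\|_\infty \left\| (\Gamma''^{\dagger})^{1/2} \Gamma'^{*} \right\|_\infty. \]

By Proposition 7 the second term may be bounded by

\[ C \left\| (\Gamma_n''^{\dagger})^{1/2} (\Gamma'' - \Gamma''_n) (\Gamma''^{\dagger})^{1/2} \right\|_\infty = O_{L^2}\!\left( \frac{1}{\alpha \sqrt{n}} \right) \]

since

\[ \left\| (\Gamma_n''^{\dagger})^{1/2} \right\|_\infty = \left\| (\Gamma''^{\dagger})^{1/2} \right\|_\infty = \alpha^{-1/2}. \]

The Cauchy-Schwarz inequality yields for the first:

\[ E \left\| (\Gamma' - \Gamma'_n)\, \Gamma_n''^{\dagger} (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \Gamma'^{*} \right\|_\infty^2 \leq M \left( E \left\| (\Gamma' - \Gamma'_n)\, \Gamma_n''^{\dagger} \right\|_\infty^4 \, E \left\| (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \right\|_\infty^4 \right)^{1/2} \leq \frac{M}{n^2 \alpha^4} \]

hence

\[ \left\| (\Gamma' - \Gamma'_n)\, \Gamma_n''^{\dagger} (\Gamma'' - \Gamma''_n)\, \Gamma''^{\dagger} \Gamma'^{*} \right\|_\infty = O_{L^2}\!\left( \frac{1}{n \alpha^2} \right). \]

The proof of Lemma 9 is finished. ■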
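Lemma 10 We have :

$$\|u_n - u\|_W = O_{L^2}\left(\frac{1}{\alpha\sqrt{n}}\right).$$

Proof. We start with

$$u_n - u = (\delta_n - \delta) - \left(\Gamma_n' \Gamma_n''^{\dagger} \delta_n' - \Gamma' \Gamma''^{\dagger} \delta'\right).$$

Clearly $\delta_n - \delta = O_{L^2}(1/\sqrt{n})$ and we study the second term

$$\Gamma_n' \Gamma_n''^{\dagger} \delta_n' - \Gamma' \Gamma''^{\dagger} \delta' = (\Gamma_n' - \Gamma') \Gamma_n''^{\dagger} \delta_n' + \Gamma' \left(\Gamma_n''^{\dagger} - \Gamma''^{\dagger}\right) \delta_n' + \Gamma' \Gamma''^{\dagger} (\delta_n' - \delta').$$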



Since is almost surely bounded (see (fl8|) ). — T = 0^2 
1 



L 2 



and I r"t 



Ir'^l 



= a 1 we get : 

(r-nr^ L = 0l ,(-i=) 



r'r'"K-i') 



L 2 



The remaining term is 

r' (r? - r"t) ^ = r'r"t ( r" - r^; 

= rT"t (r» _ rj) (r' n *0 + r> + <j , 
= r (r"t) 1/2 ( mi + m2 + ms ) 

where 

^i = (r"t) 1/2 (r"-0 (r;;t) 1/2 (r?) 1/a C0, 
m 2 = (r"t) 1/2 (r"-Or:t r >, 
m 3 =(r"t) 1/2 (r"-r:)rlH. 

First we drop $\Gamma' (\Gamma''^{\dagger})^{1/2}$ since the norm of this operator may be bounded by
a constant independent of $\alpha$ (see Proposition Ej). We turn to :



||mi||<M||(r"-r 



n/ II co 



Il"l2|| < 



(r"») 



(r?) 1/2 
Il(r"-r;i)|| 
since $\left\|\Gamma_n''^{\dagger} \Gamma_n''\right\|_\infty \le 1$ almost surely. The consequence of the display above is

ll m i||/-2 = O ( — = ] and ||m 2 ||r2 = O ( —=] . 

We can deal with $m_3$ as was done within the proof of the preceding Lemma.
Clearly we may cope with $m_3$ as if the random $\Gamma_n''^{\dagger}$ was replaced by the
non-random $\Gamma''^{\dagger}$. We should study



(r'*) 1/2 (r- 0(1^) 1/34 (v"^) irz e' n 



//f\l/2" 



1/2 



It is enough to get a rate of decrease for each of these terms. Once again
we have :



(r"t) 1/2 (r"-r") (r" 1 ") 



,/f\l/2 



(r"t) 1/2 , 





' 1 ) 


= 


L' 2 






{}-) 


= 




\ Jan / 



which completes the proof of Lemma 10. ■

Now we are ready to go back to (22) and (23) as announced earlier.



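Lemma 11 We have :

$$A_n = O_{L^2}\left(\frac{1}{\alpha\beta\sqrt{n}}\right), \qquad B_n = O_{L^2}\left(\frac{1}{\alpha\beta\sqrt{n}}\right).$$

Proof. Since

$$\|A_n\|_W \le \left\|(S_n + \beta I)^{-1}\right\|_\infty \|u_n - u\|_W,$$

by Lemma 10 and Proposition |H1 we get the first desired result.

Once again the proof of the second relies on Proposition |H1 and Lemma 9.
Indeed

$$\|B_n\|_W \le \left\|(S_n + \beta I)^{-1}\right\|_\infty \|S - S_n\|_\infty \|\phi\|_W \le \frac{\|\phi\|_W}{\beta} \|S - S_n\|_\infty,$$

hence the result. ■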

We should deal with the last term. In a first step we prove that $S_n$ may
be replaced by $S$.

Lemma 12 When $\alpha\beta\sqrt{n} \to +\infty$,

$$C_n = \beta (S + \beta I)^{-1} \phi \, (1 + o(1)).$$
Remark 13 The preceding equality should be understood with respect to the
$L^2$ norm.



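Proof. Successively,

$$\begin{aligned}
C_n &= \beta (S_n + \beta I)^{-1} \phi \\
&= \beta \left((S_n + \beta I)^{-1} - (S + \beta I)^{-1}\right) \phi + \beta (S + \beta I)^{-1} \phi \\
&= \left[(S_n + \beta I)^{-1} (S - S_n) + I\right] \beta (S + \beta I)^{-1} \phi
\end{aligned}$$

and

$$\|C_n\| \le \left\|\beta (S + \beta I)^{-1} \phi\right\| \left(1 + \left\|(S_n + \beta I)^{-1} (S - S_n)\right\|_\infty\right).$$

Now it suffices to apply Lemma 11 to get the desired result. ■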
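
The two displays in the proof of Lemma 12 can be illustrated in finite dimension. This is a minimal sketch, assuming small random symmetric matrices in place of $S$ and $S_n$. The names `S`, `S_n`, `phi`, `beta` are illustrative only, and the perturbation size `0.01` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
d, beta = 5, 0.2   # hypothetical dimension and penalization parameter

m = rng.standard_normal((d, d))
S = m @ m.T + 0.5 * np.eye(d)     # symmetric positive definite stand-in for S
e = rng.standard_normal((d, d))
S_n = S + 0.01 * (e + e.T)        # small symmetric perturbation, stand-in for S_n
phi = rng.standard_normal(d)

I = np.eye(d)
res_n = np.linalg.inv(S_n + beta * I)   # (S_n + beta I)^{-1}
res = np.linalg.inv(S + beta * I)       # (S + beta I)^{-1}

C_n = beta * (res_n @ phi)
limit = beta * (res @ phi)              # beta (S + beta I)^{-1} phi
factor = res_n @ (S - S_n) + I          # [(S_n + beta I)^{-1}(S - S_n) + I]

# Exact identity: C_n = factor applied to the limiting term.
identity_gap = np.max(np.abs(C_n - factor @ limit))

# Norm bound, with the spectral norm playing the role of the sup norm.
lhs = np.linalg.norm(C_n)
rhs = np.linalg.norm(limit) * (1 + np.linalg.norm(res_n @ (S - S_n), 2))
print(identity_gap, lhs, rhs)
```

When the perturbation `S - S_n` shrinks faster than `beta`, the factor tends to the identity, which is the finite-dimensional counterpart of the $1 + o(1)$ term.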

The next Lemma may be hard to understand at first glance. Within the 
forthcoming proof of Theorem El the bias term C n will slightly change. We 
refer to displays (J2HJ) and (j2H|) below for a deeper understanding. 

Lemma 14 The following holds :

$$\left\|\Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2}\right\|_\infty = O\left(\frac{1}{\alpha}\right).$$



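Proof. Once again it takes two steps to get the result. First note that
$\Gamma^{1/2} S^{-1} \Gamma^{1/2}$ is a bounded linear operator. Indeed

$$S = \Gamma - \Gamma' \Gamma''^{\dagger} \Gamma'^{*} = \Gamma^{1/2} A_\alpha \Gamma^{1/2} \qquad (25)$$

where $R = D \Gamma^{1/2}$,

$$A_\alpha = I - R^{*} (R R^{*} + \alpha I)^{-1} R.$$

The Schmidt decomposition of $R$ is (see (20) above for the empirical version)

$$R = \sum_{k \in \mathbb{N}} \mu_k (u_k \otimes v_k),$$

where $(u_k)_{k \in \mathbb{N}}$ (resp. $(v_k)_{k \in \mathbb{N}}$) is a complete orthonormal system in $W$ (resp.
$L$). Hence :

$$A_\alpha = \sum_{k \in \mathbb{N}} \frac{\alpha}{\mu_k^2 + \alpha} (u_k \otimes u_k).$$

The operator $A_\alpha$ has a bounded inverse

$$A_\alpha^{-1} = \sum_{k \in \mathbb{N}} \frac{\mu_k^2 + \alpha}{\alpha} (u_k \otimes u_k)$$

and $\|A_\alpha^{-1}\|_\infty = 1 + (\sup_k \mu_k^2)/\alpha \le M/\alpha$ for $M$ large enough (or $\alpha$ small
enough).

Hence

$$\Gamma^{1/2} S^{-1} \Gamma^{1/2} = \Gamma^{1/2} \Gamma^{-1/2} A_\alpha^{-1} \Gamma^{-1/2} \Gamma^{1/2} = A_\alpha^{-1}. \qquad (26)$$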

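Now (second step) we prove that :

$$\Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2} \le \Gamma^{1/2} S^{-1} \Gamma^{1/2}.$$

Let us pick a given $x$ in $W$, then

$$\left\langle \Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2} x, x \right\rangle_W = \left\langle (S + \beta I)^{-1} \Gamma^{1/2} x, \Gamma^{1/2} x \right\rangle_W .$$

It suffices to get for all $y$ in the domain of operator $\Gamma^{-1/2}$ :

$$\left\langle (S + \beta I)^{-1} y, y \right\rangle_W \le \left\langle S^{-1} y, y \right\rangle_W \qquad (27)$$

Standard results on the spectrum of $(S + \beta I)^{-1} S$ prove that $(S + \beta I)^{-1} S \ge 0$
and that $\left\|(S + \beta I)^{-1} S\right\|_\infty \le 1$ which is enough to claim (27).
We are now in position to finish the proof of the Lemma. It is plain from
(26) that

$$\left\|\Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2}\right\|_\infty \le \left\|\Gamma^{1/2} S^{-1} \Gamma^{1/2}\right\|_\infty = \left\|A_\alpha^{-1}\right\|_\infty \le \frac{M}{\alpha}$$

which is the claimed result. ■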
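
The spectral facts used in the proof of Lemma 14 can be illustrated with matrices. This is a minimal sketch under stated assumptions: `Gamma` is a random symmetric positive definite matrix, `D` is an arbitrary random matrix standing in for the differentiation operator, and the values of `alpha` and `beta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d, alpha, beta = 6, 0.05, 0.1   # hypothetical dimension and parameters

m = rng.standard_normal((d, d))
Gamma = m @ m.T + 0.1 * np.eye(d)        # stand-in for the covariance operator
w, v = np.linalg.eigh(Gamma)
G_half = (v * np.sqrt(w)) @ v.T          # symmetric square root of Gamma

D = rng.standard_normal((d, d))          # stand-in for the differential operator
R = D @ G_half
I = np.eye(d)

A = I - R.T @ np.linalg.inv(R @ R.T + alpha * I) @ R
A = (A + A.T) / 2                        # remove rounding asymmetry
S = G_half @ A @ G_half                  # S = Gamma^{1/2} A_alpha Gamma^{1/2}

mu2 = np.linalg.eigvalsh(R.T @ R)        # squared singular values of R
eig_A = np.linalg.eigvalsh(A)            # eigenvalues of A_alpha, ascending
expected = np.sort(alpha / (mu2 + alpha))

norm_A_inv = np.linalg.norm(np.linalg.inv(A), 2)
bound = 1 + mu2.max() / alpha            # 1 + sup mu_k^2 / alpha

lhs = np.linalg.norm(G_half @ np.linalg.inv(S + beta * I) @ G_half, 2)
print(norm_A_inv, bound, lhs)
```

In this finite-dimensional setting the eigenvalues of `A` are exactly $\alpha/(\mu_k^2+\alpha)$, the norm of its inverse equals $1 + \sup_k \mu_k^2/\alpha$, and the penalized quantity `lhs` never exceeds it.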

Proof of Theorem 

Now starting from (21) we get



T^L-P) 2 <M\\Y 1 ' 2 {A n + B n + C n 
V / w 11 

<M(p n || 2 y + || J B n || 2 y + ||r 1 / 2 c , n 



w 

|2 

Iw 



:28) 



Lemma 11 gives the rates of convergence for $\|A_n\|_W^2$ and $\|B_n\|_W^2$ respectively.
But Lemma 12 is unfortunately not enough to get a rate for the last
term. However this previous Lemma enables us to focus on :

$$\beta \Gamma^{1/2} (S + \beta I)^{-1} \phi = \beta \Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2} \Gamma^{-1/2} \phi \qquad (29)$$
and

$$\left\|\Gamma^{1/2} C_n\right\|_W^2 \le \left\|\beta \Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2}\right\|_\infty^2 \left\|\Gamma^{-1/2} \phi\right\|_W^2 . \qquad (30)$$

By assumption A4, $\left\|\Gamma^{-1/2} \phi\right\|_W$ is finite. We deal with the central term,
namely :

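$$\begin{aligned}
\Gamma^{1/2} (S + \beta I)^{-1} \Gamma^{1/2} &= \Gamma^{1/2} \left(\Gamma^{1/2} A_\alpha \Gamma^{1/2} + \beta I\right)^{-1} \Gamma^{1/2} \\
&\le \Gamma^{1/2} \left(\Gamma^{1/2} A_\alpha \Gamma^{1/2}\right)^{-1} \Gamma^{1/2} = A_\alpha^{-1}.
\end{aligned}$$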

(see and 

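$$\left\|A_\alpha^{-1}\right\|_\infty = O(\alpha^{-1}).$$

Collecting this last display with (30) we get

$$\left\|\Gamma^{1/2} C_n\right\|_W^2 = O\left(\frac{\beta^2}{\alpha^2}\right).$$

This finishes the proof of the main Theorem.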